Building And Sharing Multilingual Speech Resources Using ERIM Generic Platforms
Abstract
Within the ChinFaDial and ERIM projects, we have in recent years developed several platforms for handling various aspects of bilingual spoken dialogues on the web, mainly the collection of spontaneous speech corpora through remote human interpreting. Current development of the core ERIM-Interp and ERIM-Collect platforms now includes multimodal user interaction, the integration of some machine aids (such as speech turn logs produced by speech recognition or, tentatively, speech machine translation, both based on server-based commercial products), and, as a next step, online aids for speakers and/or interpreters. The first collected data should be made available on the web in fall 2004 (DistribDial) along with, as soon as it is available, a robust version of the collection platform, in order to promote the collaborative building and sharing of "raw" unannotated multilingual speech corpora. A variant of the ERIM environment is to be extended to remote e-training in interpreting, possibly creating situations that should in turn, in our view, foster larger-scale data collection and sharing in open-access mode.
Related resources
Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
We describe here the three main platforms in the ERIM family of Web-based environments for human interpreting: ERIM-Interp and ERIM-Collect in more detail, then ERIM-Aid. Each platform supports one aspect of the collection or study of spontaneous bilingual dialogues translated by an interpreter. ERIM-Interp is the core environment, providing mediated communication between speakers...
Building Synthetic Voices in the META-NET Framework
METANET4U is a European project aiming at supporting language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aiming at fostering the mission of META, which is the Multilingual Europe Technology Alliance, dedicated to building the technological foundations of a multilingual European information society. This pap...
Multilingual Grammar Resources in Multilingual Application Development
Grammar development makes up a large part of the multilingual rule-based application development cycle. One way to decrease the required grammar development efforts is to base the systems on multilingual grammar resources. This paper presents a detailed description of a parametrization mechanism used for building multilingual grammar rules. We show how these rules, which had originally been des...
Adaptation techniques for speech synthesis in under-resourced languages
This paper presents techniques for building speech synthesizers targeted at limited-data scenarios: limited data from a target speaker, or limited or no data in a target language. A resource-sharing strategy within speakers and languages is presented, giving promising directions for under-resourced languages. Our results show the importance of the amount of training data, the selection of languages ...
Using Multilingual Resources for Building SloWNet Faster
This project report presents the results of an approach in which synsets for the Slovene wordnet were induced automatically from parallel corpora and existing wordnets. First, multilingual lexicons were obtained from word-aligned corpora and compared to the wordnets in various languages in order to disambiguate lexicon entries. Then appropriate synset ids were attached to Slovene entries fr...